solution length
Subgoal-Guided Policy Heuristic Search with Learned Subgoals
Tuero, Jake, Buro, Michael, Lelis, Levi H. S.
Policy tree search is a family of tree search algorithms that use a policy to guide the search. These algorithms provide guarantees on the number of expansions required to solve a given problem that are based on the quality of the policy. While these algorithms have shown promising results, the process in which they are trained requires complete solution trajectories to train the policy. Search trajectories are obtained during a trial-and-error search process. When the training problem instances are hard, learning can be prohibitively costly, especially when starting from a randomly initialized policy. As a result, search samples are wasted in failed attempts to solve these hard instances. This paper introduces a novel method for learning subgoal-based policies for policy tree search algorithms. The subgoals and policies conditioned on subgoals are learned from the trees that the search expands while attempting to solve problems, including the search trees of failed attempts. We empirically show that our policy formulation and training method improve the sample efficiency of learning a policy and heuristic function in this online setting.
The Virtues of Brevity: Avoid Overthinking in Parallel Test-Time Reasoning
Dinardi, Raul Cavalcante, Yamamoto, Bruno, Costa, Anna Helena Reali, Jordao, Artur
Reasoning models represent a significant advance in LLM capabilities, particularly for complex reasoning tasks such as mathematics and coding. Previous studies confirm that parallel test-time compute-sampling multiple solutions and selecting the best one-can further enhance the predictive performance of LLMs. However, strategies in this area often require complex scoring, thus increasing computational cost and complexity. In this work, we demonstrate that the simple and counterintuitive heuristic of selecting the shortest solution is highly effective. We posit that the observed effectiveness stems from models operating in two distinct regimes: a concise, confident conventional regime and a verbose overthinking regime characterized by uncertainty, and we show evidence of a critical point where the overthinking regime begins to be significant. By selecting the shortest answer, the heuristic preferentially samples from the conventional regime. We confirm that this approach is competitive with more complex methods such as self-consistency across two challenging benchmarks while significantly reducing computational overhead. The shortest-answer heuristic provides a Pareto improvement over self-consistency and applies even to tasks where output equality is not well defined.
A Machine Learning Approach That Beats Large Rubik's Cubes
Chervov, Alexander, Khoruzhii, Kirill, Bukhal, Nikita, Naghiyev, Jalal, Zamkovoy, Vladislav, Koltsov, Ivan, Cheldieva, Lyudmila, Sychev, Arsenii, Lenin, Arsenii, Obozov, Mark, Urvanov, Egor, Romanov, Alexey
The paper proposes a novel machine learning-based approach to the pathfinding problem on extremely large graphs. This method leverages diffusion distance estimation via a neural network and uses beam search for pathfinding. We demonstrate its efficiency by finding solutions for 4x4x4 and 5x5x5 Rubik's cubes with unprecedentedly short solution lengths, outperforming all available solvers and introducing the first machine learning solver beyond the 3x3x3 case. In particular, it surpasses every single case of the combined best results in the Kaggle Santa 2023 challenge, which involved over 1,000 teams. For the 3x3x3 Rubik's cube, our approach achieves an optimality rate exceeding 98%, matching the performance of task-specific solvers and significantly outperforming prior solutions such as DeepCubeA (60.3%) and EfficientCube (69.6%). Additionally, our solution is more than 26 times faster in solving 3x3x3 Rubik's cubes while requiring up to 18.5 times less model training time than the most efficient state-of-the-art competitor.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > France (0.04)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (4 more...)
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zeng, Zhiyuan, Cheng, Qinyuan, Yin, Zhangyue, Zhou, Yunhua, Qiu, Xipeng
The advent of test-time scaling in large language models (LLMs), exemplified by OpenAI's o1 series, has advanced reasoning capabilities by scaling computational resource allocation during inference. While successors like QwQ, Deepseek-R1 (R1) and LIMO replicate these advancements, whether these models truly possess test-time scaling capabilities remains underexplored. This study found that longer CoTs of these o1-like models do not consistently enhance accuracy; in fact, correct solutions are often shorter than incorrect ones for the same questions. Further investigation shows this phenomenon is closely related to models' self-revision capabilities - longer CoTs contain more self-revisions, which often lead to performance degradation. We then compare sequential and parallel scaling strategies on QwQ, R1 and LIMO, finding that parallel scaling achieves better coverage and scalability. Based on these insights, we propose "Shortest Majority Vote", a method that combines parallel scaling strategies with CoT length characteristics, significantly improving models' test-time scalability compared to conventional majority voting approaches.
- Europe > Austria > Vienna (0.15)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (7 more...)
Effective Sampling for Robot Motion Planning Through the Lens of Lattices
Panasoff, Itai, Solovey, Kiril
Sampling-based methods for motion planning, which capture the structure of the robot's free space via (typically random) sampling, have gained popularity due to their scalability, simplicity, and for offering global guarantees, such as probabilistic completeness and asymptotic optimality. Unfortunately, the practicality of those guarantees remains limited as they do not provide insights into the behavior of motion planners for a finite number of samples (i.e., a finite running time). In this work, we harness lattice theory and the concept of $(\delta,\epsilon)$-completeness by Tsao et al. (2020) to construct deterministic sample sets that endow their planners with strong finite-time guarantees while minimizing running time. In particular, we introduce a highly-efficient deterministic sampling approach based on the $A_d^*$ lattice, which is the best-known geometric covering in dimensions $\leq 21$. Using our new sampling approach, we obtain at least an order-of-magnitude speedup over existing deterministic and uniform random sampling methods for complex motion-planning problems. Overall, our work provides deep mathematical insights while advancing the practical applicability of sampling-based motion planning.
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Research Report (0.64)
- Overview (0.46)
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Luo, Haotian, Shen, Li, He, Haiying, Wang, Yibo, Liu, Shiwei, Li, Wei, Tan, Naiqiang, Cao, Xiaochun, Tao, Dacheng
Recently, long-thought reasoning LLMs, such as OpenAI's O1, adopt extended reasoning processes similar to how humans ponder over complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to effectively allocate token budgets based on problem difficulty and reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), aiming at minimizing reasoning overhead while maintaining accuracy. This effective fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to achieve efficient reasoning with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
Level Generation Through Large Language Models
Todd, Graham, Earle, Sam, Nasir, Muhammad Umair, Green, Michael Cerny, Togelius, Julian
Large Language Models (LLMs) are powerful tools, capable of leveraging their training on natural language to write stories, generate code, and answer questions. But can they generate functional video game levels? Game levels, with their complex functional constraints and spatial relationships in more than one dimension, are very different from the kinds of data an LLM typically sees during training. Datasets of game levels are also hard to come by, potentially taxing the abilities of these data-hungry models. We investigate the use of LLMs to generate levels for the game Sokoban, finding that LLMs are indeed capable of doing so, and that their performance scales dramatically with dataset size. We also perform preliminary experiments on controlling LLM level generators and discuss promising areas for future work.
- Europe > Portugal > Lisbon > Lisbon (0.05)
- North America > United States > New York > Kings County > New York City (0.05)
- Oceania > Australia > Queensland > Cairns Region > Cairns (0.04)
- (2 more...)
Start Small: Training Controllable Game Level Generators without Training Data by Learning at Multiple Sizes
Zakaria, Yahia, Fayek, Magda, Hadhoud, Mayada
A level generator is a tool that generates game levels from noise. Training a generator without a dataset suffers from feedback sparsity, since it is unlikely to generate a playable level via random exploration. A common solution is shaped rewards, which guides the generator to achieve subgoals towards level playability, but they consume effort to design and require game-specific domain knowledge. This paper proposes a novel approach to train generators without datasets or shaped rewards by learning at multiple level sizes starting from small sizes and up to the desired sizes. The denser feedback at small sizes negates the need for shaped rewards. Additionally, the generators learn to build levels at various sizes, including sizes they were not trained for. We apply our approach to train recurrent auto-regressive generative flow networks (GFlowNets) for controllable level generation. We also adapt diversity sampling to be compatible with GFlowNets. The results show that our generators create diverse playable levels at various sizes for Sokoban, Zelda, and Danger Dave. When compared with controllable reinforcement learning level generators for Sokoban, the results show that our generators achieve better controllability and competitive diversity, while being 9x faster at training and level generation.
- North America > United States > New York > New York County > New York City (0.04)
- Africa > Middle East > Egypt > Giza Governorate > Giza (0.04)
- Europe > United Kingdom > Scotland > City of Dundee > Dundee (0.04)
- Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
- Research Report > New Finding (0.87)
- Research Report > Experimental Study (0.68)
The Fifteen Puzzle- A New Approach through Hybridizing Three Heuristics Methods
Hasan, Dler O., Aladdin, Aso M., Talabani, Hardi Sabah, Rashid, Tarik Ahmed, Mirjalili, Seyedali
Fifteen Puzzle problem is one of the most classical problems that have captivated mathematical enthusiasts for centuries. This is mainly because of the huge size of the state space with approximately 1013 states that have to be explored and several algorithms have been applied to solve the Fifteen Puzzle instances. In this paper, to deal with this large state space, Bidirectional A* (BA*) search algorithm with three heuristics, such as Manhattan distance (MD), linear conflict (LC), and walking distance (WD) has been used to solve the Fifteen Puzzle problems. The three mentioned heuristics will be hybridized in a way that can dramatically reduce the number of generated states by the algorithm. Moreover, all those heuristics require only 25KB of storage but help the algorithm effectively reduce the number of generated states and expand fewer nodes. Our implementation of BA* search can significantly reduce the space complexity, and guarantee either optimal or near-optimal solutions.1
- Asia > Middle East > Iraq > Erbil Governorate > Erbil (0.04)
- Oceania > Australia > South Australia > Adelaide (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (3 more...)
The Impact of Visualizing Design Gradients for Human Designers
Guzdial, Matthew, Sturtevant, Nathan, Yang, Carolyn
Mixed-initiative Procedural Content Generation (PCG) refers to tools or systems in which a human designer works with an algorithm to produce game content. This area of research remains relatively under-explored, with the majority of mixed-initiative PCG level design systems using a common set of search-based PCG algorithms. In this paper, we introduce a mixed-initiative tool employing Exhaustive PCG (EPCG) for puzzle level design to further explore mixed-initiative PCG. We run an online human subject study in which individuals use the tool with an EPCG component turned on or off. Our analysis of the results demonstrates that, although a majority of users did not prefer the tool, it made the level design process significantly easier, and that the tool impacted the subjects' design process. This paper describes the study results and draws lessons for mixed-initiative PCG tool design.
- Research Report > New Finding (0.88)
- Research Report > Experimental Study (0.69)